Sequence-Level Knowledge Distillation
Abstract
Neural machine translation (NMT) offers a novel alternative formulation of translation that is potentially simpler than statistical approaches. However, to reach competitive performance, NMT models need to be exceedingly large. In this paper we consider applying knowledge distillation approaches (Bucila et al., 2006; Hinton et al., 2015), which have proven successful for reducing the size of neural models in other domains, to the problem of NMT. We demonstrate that standard knowledge distillation applied to word-level prediction can be effective for NMT, and we also introduce two novel sequence-level versions of knowledge distillation that further improve performance and, somewhat surprisingly, seem to eliminate the need for beam search (even when applied to the original teacher model). Our best student model runs 10 times faster than its state-of-the-art teacher with a decrease of only 0.2 BLEU. It is also significantly better than a baseline model trained without knowledge distillation: by 4.2/1.7 BLEU with greedy decoding/beam search.
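The word-level distillation mentioned in the abstract trains the student to match the teacher's per-position output distribution rather than only the hard reference tokens. A minimal sketch of that loss, in plain Python and assuming per-position logit lists for illustration (the function names and the temperature parameter are not from the paper):

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def word_level_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Cross-entropy between the teacher's soft distribution and the
    student's distribution, summed over positions in the sequence.

    student_logits, teacher_logits: lists of per-position logit lists,
    one entry per target position, each over the same vocabulary.
    """
    loss = 0.0
    for s_pos, t_pos in zip(student_logits, teacher_logits):
        q = softmax(t_pos, temperature)  # teacher soft targets
        p = softmax(s_pos, temperature)  # student predictions
        loss += -sum(qi * math.log(pi) for qi, pi in zip(q, p))
    return loss
```

When the student matches the teacher exactly, the loss reduces to the teacher's entropy; any divergence increases it, which is what drives the student toward the teacher's predictive distribution. The sequence-level variants in the paper instead train the student on whole output sequences produced by the teacher, which this per-position sketch does not cover.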